(57) A method of determining a pitch period of an
audio signal using a correlation-based signal derived
from the audio signal. The correlation-based signal includes
known peaks each corresponding to a respective
one of known time lags. The known peaks include a
global maximum peak. The method comprises: (a) determining
if a candidate peak among the local peaks exceeds
a peak threshold; (b) determining if a candidate
time lag corresponding to the candidate peak is within
a predetermined range of at least one integer sub-multiple
of the time lag corresponding to the global maximum
peak; and (c) setting the pitch period equal to the
candidate time lag when the determinations of both
steps (a) and (b) are true.
Description

BACKGROUND OF THE INVENTION 

Field of the Invention

[0001] This invention relates generally to digital communications; and more particularly, to digital coding (or com- 
pression) of speech and/or audio signals. 

Related Art

[0002] In the field of speech coding, the most popular encoding method is predictive coding. Most of the popular 
predictive speech coding schemes, such as Multi-Pulse Linear Predictive Coding (MPLPC) and Code-Excited Linear 
Prediction (CELP), use two kinds of prediction. The first kind, called short-term prediction, exploits the correlation
between adjacent speech samples. The second kind, called long-term prediction, exploits the correlation between
speech samples at a much greater distance. Voiced speech signal waveforms are nearly periodic if examined in a local 
scale of 20 to 30 ms. The period of such a locally periodic speech waveform is called the pitch period. When the speech 
waveform is nearly periodic, each speech sample is fairly predictable from speech samples roughly one pitch period 
earlier. The long-term prediction in most predictive speech coding systems exploits such pitch periodicity. Obtaining
an accurate estimate of the pitch period at each update instant is often critical to the performance of the long-term
predictor and the overall predictive coding system. 

[0003] A straightforward prior-art approach for extracting the pitch period is to identify the time lag corresponding to 
the largest correlation or normalized correlation values for time lags in the target pitch period range. However, the 
resulting computational complexity can be quite high. Furthermore, a common problem is that the estimated pitch period
produced this way is often an integer multiple of the true pitch period.
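
The straightforward approach described above can be sketched as follows. This is a minimal illustration, not the patent's method: the function name `naive_pitch_lag`, the choice of summation window, and the tie-breaking rule are assumptions made for the example. On a perfectly periodic signal the true period and its integer multiples score identically, which is why small perturbations can push the estimate to a multiple of the true pitch period.

```python
def naive_pitch_lag(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] that maximizes c(k)^2 / E(k).

    Hypothetical helper for illustration only. Every candidate lag needs
    a full correlation and energy sum, hence the high complexity.
    """
    n0 = max_lag  # first sample index with enough history for every lag
    best_lag, best_score = min_lag, -1.0
    for k in range(min_lag, max_lag + 1):
        c = sum(x[n] * x[n - k] for n in range(n0, len(x)))
        e = sum(x[n - k] ** 2 for n in range(n0, len(x)))
        if c > 0 and e > 0:
            score = c * c / e
            if score > best_score:  # strict ">" keeps the smallest lag on ties
                best_lag, best_score = k, score
    return best_lag


# Integer-valued signal with a true period of 20 samples.
pattern = [(7 * n) % 20 - 10 for n in range(20)]
signal = [pattern[n % 20] for n in range(120)]
estimate = naive_pitch_lag(signal, 10, 45)
```

Here lags 20 and 40 tie exactly, and only the strict comparison keeps the smaller one.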

[0004] A common way to combat the complexity issue is to decimate the speech signal, and then do the correlation 
peak-picking in the decimated signal domain. However, the reduced time resolution and audio bandwidth of the deci- 
mated signal can sometimes cause problems in pitch extraction. 

[0005] A common way to combat the multiple-pitch problem is to buffer more pitch period estimates at "future" update
instants, and then attempt to smooth out multiple pitch periods by so-called "backward tracking". However, this
increases the signal delay through the system.

BRIEF SUMMARY OF THE INVENTION 

[0006] The present invention aims to alleviate at least some of the disadvantages described above. According to
one aspect, the present invention aims to achieve low complexity using signal decimation, but it attempts to preserve
more time resolution by interpolating around each correlation peak. According to another aspect, the present invention
aims to eliminate nearly all of the occurrences of multiple pitch period using novel decision logic, without buffering
future pitch period estimates. Thus, it aims to achieve good pitch extraction performance with low complexity and low
delay.

[0007] Different aspects of the invention are set out in the appended claims. 

[0008] According to one embodiment of the present invention, the following procedure is used to extract the pitch 
period from the speech signal. First, the speech signal is passed through a filter that reduces formant peaks relative 
to the spectral valleys. A good example of such a filter is the perceptual weighting filter used in CELP coders. Second,
the filtered speech signal is properly low-pass filtered and decimated to a lower sampling rate. Third, a "coarse pitch
period" is extracted from this decimated signal, using quadratic interpolation of normalized correlation peaks and elaborate
decision logic. Fourth, the coarse pitch period is mapped to the time resolution of the original undecimated signal,
and a second-stage pitch refinement search is performed in the neighborhood of the mapped coarse pitch period, by
maximizing normalized correlation in the undecimated signal domain. The resulting refined pitch period is the final
output pitch period.

[0009] The first contribution to this embodiment is the use of a quadratic interpolation method around the local peaks 
of the correlation function of the decimated signal, the method being based on a search procedure that eliminates the 
need of any division operation. Such quadratic interpolation improves the time resolution of the correlation function of 
the decimated signal, and therefore improves the performance of pitch extraction, without incurring the high complexity
of full correlation peak search in the original (undecimated) signal domain.

[0010] The second contribution to this embodiment is a decision logic that searches through a certain pitch range in 
the decimated signal domain, and identifies the smallest time lag where there is a large enough local peak of correlation 
near every one of its integer multiples within a certain range, and where the threshold for determining whether a local 
correlation peak is large enough is a function of the integer multiple.

[0011] The third contribution to this embodiment is a decision logic that involves finding the time lag of the maximum
interpolated correlation peak around the last coarse pitch period, and determining whether it should be accepted as 
the output coarse pitch period using different correlation thresholds, depending on whether the candidate time lag is 
greater than the time lag of the global maximum interpolated correlation peak or not. 

[0012] The fourth contribution to this embodiment is a decision logic that insists that if the time lag of the maximum
interpolated correlation peak around the last coarse pitch period is less than the time lag of the global maximum interpolated
correlation peak and is also less than half of the maximum allowed coarse pitch period, then it can be chosen
as the output coarse pitch period only if the time lag of the global maximum correlation peak is near an integer multiple
of it, where the integer is one of 2, 3, 4, or 5.

[0013] Another embodiment of the present invention includes a method of determining a pitch period of an audio 
signal using a correlation-based signal derived from the audio signal. The correlation-based signal includes known 
peaks each corresponding to a respective one of known time lags. The known peaks include a global maximum peak. 
The method comprises: (a) determining if a candidate peak among the local peaks exceeds a peak threshold; (b) 
determining if a candidate time lag corresponding to the candidate peak is within a predetermined range of at least 
one integer sub-multiple of the time lag corresponding to the global maximum peak; and (c) setting the pitch period 
equal to the candidate time lag when the determinations of both steps (a) and (b) are true. 

[0014] A further embodiment includes another method of determining a pitch period of an audio signal using a correlation-based
signal derived from the audio signal. The correlation-based signal includes known peaks at corresponding
known time lags. The second embodiment comprises: (a) searching the correlation-based signal for a first time lag
corresponding to a global maximum interpolated peak of the correlation-based signal; (b) searching the correlation-based
signal for a maximum interpolated peak corresponding to a second time lag within a predetermined time lag
range of a previously determined pitch period of the audio signal; (c) searching the correlation-based signal for a third
time lag; and (d) selecting as a time lag indicative of the pitch period a preferred one of the first time lag if found in step
(a), the second time lag if found in step (b), and the third time lag if found in step (c). Steps (a), (b), (c) and (d) of this
second embodiment may be performed in accordance with at least portions of respective example Algorithms A1, A2,
A3 and A4, described in detail below.

[0015] Further embodiments, features, and advantages of the present invention, as well as the structure and oper- 
ation of the various embodiments of the present invention, are described in detail below with reference to the accom- 
panying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS/ FIGURES 

[0016] The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate the 
present invention and, together with the description, further serve to explain the principles of the invention and to enable 
a person skilled in the pertinent art to make and use the invention. In the drawings, like reference numbers indicate 
identical or functionally similar elements. The terms "algorithm" and "method" as used herein have equivalent meanings, 
and may be used interchangeably. 

[0017] FIG. 1 is a block diagram of an example pitch extractor. 

[0018] FIG. 2 is a flow chart of an example first-phase coarse pitch period searcher/determiner method performed
by a portion of the pitch extractor of FIG. 1.

[0019] FIG. 3 is an example Results Table produced by preliminary method steps in the method of FIG. 2. 
[0020] FIG. 4 is a plot of an example correlation-based signal, such as an NCS signal. 
[0021] FIG. 5 is an example Results Table produced by the method of FIG. 2. 

[0022] FIG. 6 is a plot of an example NCS signal including interpolated NCS values near NCS local peaks. 
[0023] FIG. 7 is a flowchart of an example method corresponding generally to an example pitch extraction algorithm, 
Algorithm A1. 

[0024] FIG. 8 is a flowchart of an example method corresponding generally to an example pitch extraction algorithm, 
Algorithm A2. 

[0025] FIG. 9 is a flowchart of an example method corresponding generally to an example pitch extraction algorithm, 
Algorithm A3. 

[0026] FIG. 10 is an example plot of portions of an NCS signal useful for describing portions of Algorithm A3.
[0027] FIGs. 11A and 11B are flowcharts that collectively represent an example method corresponding to an example
pitch extraction algorithm, Algorithm A4.

[0028] FIG. 11C is a plot of correlation-based magnitude against time lag which serves as an illustration of Algorithm
A4 and a portion of the method of FIGs. 11A and 11B.

[0029] FIG. 12 is a flowchart of an example method, according to an alternative, generalized embodiment of the 
present invention. 
[0030] FIG. 13 is a plot of a correlation-based signal 1300 representative of either a decimated or a non-decimated
correlation-based signal. 

[0031] FIG. 14 is a flowchart of a generalized method representative of a portion of Algorithm A4. 

[0032] FIG. 15 is a block diagram of an example system/apparatus for performing one or more of the methods of
the present invention.

[0033] FIG. 16 is a block diagram of an example arrangement of a module of the system of FIG. 15. 

[0034] FIG. 17 is a block diagram of an example arrangement of another module of the system of FIG. 15.

[0035] FIG. 18 is an example arrangement of another module of the system of FIG. 15. 

[0036] FIG. 19 is a block diagram of an example arrangement of another module of the system of FIG. 15. 
[0037] FIG. 20 is a block diagram of a computer system on which embodiments of the present invention may operate.

DETAILED DESCRIPTION OF THE INVENTION 

[0038] In this section, an embodiment of the present invention is described. This embodiment is a pitch extractor for
16 kHz sampled speech or audio signals (collectively referred to herein as an audio signal). The pitch extractor extracts
a pitch period of the audio signal once per frame of the audio signal, where each frame is 5 ms long, or 80 samples.
Thus, the pitch extractor operates in a repetitive manner to extract successive pitch periods over time. For example,
the pitch extractor extracts a previous or past pitch period, a current pitch period, then a future pitch period, corresponding
to past, current and future audio signal frames, respectively.

[0039] To reduce computational complexity, the pitch extractor uses 8:1 decimation to decimate the input audio signal
to a sampling rate of only 2 kHz. All parameter values are provided just as examples. With proper adjustments or
retuning of the parameter values, the same pitch extractor scheme can be used to extract the pitch period from input
audio signals of other sampling rates or with different decimation factors.

[0040] Note that the sounds of many musical instruments, such as horn and trumpet, also have waveforms that
appear locally periodic with a well-defined pitch period. The present invention can also be used to extract the pitch
period of such a solo musical instrument, as long as the pitch period is within the range set by the pitch extractor. For
convenience, the following description uses "speech" to refer to either speech or audio.

[0041] FIG. 1 is a high-level block diagram of an example pitch extractor system 5 in which embodiments of the
present invention may operate. Depicted in FIG. 1 are enumerated signal processing apparatus blocks 10-50. It is to
be understood that blocks 10-50 may represent either apparatus blocks or method steps/algorithms performed by such
apparatus blocks. The input speech signal is denoted as s(n), where n is the sample index. The input speech signal
is passed through a weighting filter (block 10). This filter generally suppresses the spectral peaks in the spectral envelope
to some degree, but not completely. A good example of such a filter is the perceptual weighting filter used in
CELP speech coders, which usually has a transfer function of
where 0 < β < α < 1, and

$$A(z) = \sum_{i=0}^{M} a_i z^{-i}$$

is the short-term prediction error filter, M is the order of the filter, and a_i, i = 0, 1, 2, ..., M are the predictor coefficients.
[0042] The output signal of the weighting filter, denoted as sw(n), is passed through a fixed low-pass filter block 20,
which has a -3 dB cut-off frequency at about 800 Hz. A 4th-order elliptic filter is used for this purpose. The transfer
function of this low-pass filter is

$$H_{lpf}(z) = \frac{0.0322952 - 0.1028824\,z^{-1} + 0.1446838\,z^{-2} - 0.1028824\,z^{-3} + 0.0322952\,z^{-4}}{1 - 3.5602306\,z^{-1} + 4.8558478\,z^{-2} - 2.9988298\,z^{-3} + 0.7069277\,z^{-4}}$$

[0043] Block 30 down-samples the low-pass filtered signal to a sampling rate of 2 kHz. This represents an 8:1 dec- 
imation. In other words, the decimation factor D is 8. The output signal of the decimation block 30 is denoted as swd(n). 
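
Blocks 20 and 30 can be sketched in plain Python as below, using the numerator and denominator coefficients of the low-pass filter given above. The function names `lowpass` and `decimate`, the zero initial filter state, and the decimation phase (keeping the last sample of each group of D) are assumptions made for this illustration; the text does not specify them.

```python
# Coefficients of the 4th-order elliptic low-pass filter (block 20).
B = [0.0322952, -0.1028824, 0.1446838, -0.1028824, 0.0322952]
A = [1.0, -3.5602306, 4.8558478, -2.9988298, 0.7069277]
D = 8  # decimation factor (block 30)


def lowpass(sw):
    """Filter sw(n) with the direct-form difference equation
    y(n) = sum_i B[i] sw(n-i) - sum_{i>=1} A[i] y(n-i), zero initial state."""
    y = []
    for n in range(len(sw)):
        acc = sum(B[i] * sw[n - i] for i in range(5) if n - i >= 0)
        acc -= sum(A[i] * y[n - i] for i in range(1, 5) if n - i >= 0)
        y.append(acc)
    return y


def decimate(x, d=D):
    """Keep every d-th sample (assumed phase: last sample of each group)."""
    return x[d - 1::d]


# One 80-sample frame at 16 kHz becomes 10 samples at 2 kHz.
frame = [float(n % 16) for n in range(80)]
swd = decimate(lowpass(frame))
```

A production coder would carry the filter state across frames rather than restarting from zero for each frame.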

Block 40 

Initial Processing 

[0044] The first-stage coarse pitch period search block 40 then uses the decimated 2 kHz sampled signal swd(n) to
find a "coarse pitch period", denoted as cpp in FIG. 1. The time lag represented by cpp is in terms of number of samples
in the 2 kHz down-sampled signal swd(n). FIG. 2 is a flow chart of an example method 200 representing the signal
processing, that is, method steps or algorithms, used in block 40. These algorithms are described in detail below.
[0045] Block 40 uses a pitch analysis window of 15 ms. The end of the pitch analysis window is lined up with the
end of the current frame of the speech or audio signal. At a sampling rate of 2 kHz, 15 ms corresponds to 30 samples.
Without loss of generality, let the index range of n = 1 to n = 30 correspond to the pitch analysis window for swd(n). In
an initial step 202, block 40 calculates the following correlation and energy values



for all integers from k = MINPPD - 1 to k = MAXPPD + 1, where MINPPD and MAXPPD are the minimum and maximum
pitch period in the decimated domain, respectively. Example values for a wideband coder are MINPPD = 1 sample and
MAXPPD = 33 samples.
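
A minimal sketch of step 202 follows. It assumes the conventional definitions c(k) = Σ swd(n)·swd(n - k) and E(k) = Σ swd(n - k)^2, with both sums taken over the 30-sample analysis window; these forms are an assumption chosen to be consistent with the normalized correlation square c^2(k)/E(k) used later. The function name and the buffer layout (analysis window at the end of the list) are likewise hypothetical.

```python
WINDOW = 30      # 15 ms analysis window at 2 kHz
MINPPD = 1       # example minimum pitch period, decimated domain
MAXPPD = 33      # example maximum pitch period, decimated domain


def correlation_and_energy(swd):
    """Return dicts c[k] and E[k] for k = MINPPD-1 .. MAXPPD+1.

    swd must end with the analysis window and hold at least
    WINDOW + MAXPPD + 1 samples so that swd(n - k) always exists.
    """
    if len(swd) < WINDOW + MAXPPD + 1:
        raise ValueError("not enough signal history")
    start = len(swd) - WINDOW
    c, E = {}, {}
    for k in range(MINPPD - 1, MAXPPD + 2):
        c[k] = sum(swd[n] * swd[n - k] for n in range(start, len(swd)))
        E[k] = sum(swd[n - k] ** 2 for n in range(start, len(swd)))
    return c, E


# Integer test signal with a period of 10 decimated samples.
pattern = [3, 1, -2, 4, 0, -3, 2, -1, 5, -4]
swd = [pattern[n % 10] for n in range(70)]
c, E = correlation_and_energy(swd)
```

For this periodic input, c(10) equals E(10), so the normalized correlation square reaches its maximum at the true period.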

[0046] In a next step 204, block 40 then searches through the range of k = MINPPD, MINPPD + 1, MINPPD + 2, ...,
MAXPPD to find all local peaks of the array {c^2(k)/E(k)} for which c(k) > 0. A local peak is a member of the array
{c^2(k)/E(k)} that has a greater magnitude than its nearest neighbors in the array (e.g., left and right members). For example,
consider members of the array {c^2(k)/E(k)} corresponding to successive time lags k_1, k_2, and k_3. If the member corresponding
to time lag k_2 is greater than the neighboring members at time lags k_1 and k_3, then the member at time lag
k_2 is a local peak in the array {c^2(k)/E(k)}.
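
The local-peak search of step 204 can be sketched as below. The function name `find_local_peaks` and the treatment of zero energy are assumptions for the example, and the ratio is computed with an explicit division here only for readability.

```python
def find_local_peaks(c, E, min_lag, max_lag):
    """Return the sorted lags k in [min_lag, max_lag] where c^2(k)/E(k)
    exceeds both neighbors and c(k) > 0.

    c and E map each lag in [min_lag - 1, max_lag + 1] to its value.
    """
    def ncs(k):
        # Assumption: treat a zero-energy lag as having zero NCS.
        return c[k] * c[k] / E[k] if E[k] > 0 else 0.0

    peaks = []
    for k in range(min_lag, max_lag + 1):
        if c[k] > 0 and ncs(k) > ncs(k - 1) and ncs(k) > ncs(k + 1):
            peaks.append(k)
    return peaks


# Toy data: unit energies, so the NCS is simply c(k)^2.
c = dict(enumerate([1, 2, 5, 2, 1, -6, 1, 3, 1]))
E = {k: 1.0 for k in c}
peaks = find_local_peaks(c, E, 1, 7)
```

Lag 5 has the largest squared correlation in this toy data but is rejected because c(5) is negative.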

[0047] Let N_p denote the number of such positive local peaks. Let k_p(j), j = 1, 2, ..., N_p be the indices where c^2(k_p(j))/E(k_p(j))
is a local peak and c(k_p(j)) > 0, and let k_p(1) < k_p(2) < ... < k_p(N_p). For convenience, the term c^2(k)/E(k) will be
referred to as the "normalized correlation square" (NCS) or NCS signal. Signals c(k), c^2(k), and c^2(k)/E(k) represent,
and are referred to herein as, "correlation-based" signals because they are derived from the audio signal using a correlation
operation, or include a correlation signal term (e.g., c(k)). A signal "peak" (such as a local peak in the array
c^2(k)/E(k), for example) inherently has a magnitude or value associated with it, and thus, the term "peak" is used herein
to identify the peak being discussed, and in some contexts to mean the "peak magnitude" or "peak value" associated
with the peak. For example, in the description below, if it is stated that peaks are being compared to one another or
against peak thresholds, this means the magnitudes or values of the peaks are being compared to one another or
against the peak thresholds. Also, each audio signal frame corresponds to a frame of the correlation-based signal,
where a correlation-based signal frame includes correlation-based signal values corresponding to time lags k =
MINPPD - 1 to k = MAXPPD + 1, for example.

[0048] Steps 202 and 204 of block 40 produce various results, as described above and indicated in FIG. 2. These 
results are considered known or predetermined for purposes of their further use in subsequent methods. FIG. 3 is an 
example Table 300 of these results. Results Table 300 may be stored in a memory, such as a RAM, for example. Table 
300 includes a first or top row of j-values 1, 2, ..., N_p (302). Each j-value identifies or corresponds to a separate column
of Table 300. The second row of Table 300 includes correlation square values 304 corresponding to j-values 302. The
third row of Table 300 includes energy values 306 corresponding to respective ones of the j-values 302 and the correlation
square values 304. Correlation square values 304 and energy values 306 together represent NCS local peaks
308. More specifically, each one of NCS local peaks 308 is represented as a ratio of one of correlation square values
304 to its corresponding one of energy values 306. A fourth or bottom row of Table 300 includes time lags (k_p) 310
corresponding to NCS local peaks 308.

[0049] FIG. 4 is a plot of NCS magnitude (Y-axis) against time lag (X-axis) for an example NCS signal 400. NCS 
signal 400 includes NCS signal values 402 (represented as the ratios of correlation square values to energy values) 
spaced-apart in time from one another along the time lag axis. NCS signal 400 includes NCS local peaks 308, mentioned 
above in connection with Table 300 of FIG. 3. 

[0050] Returning to the process depicted in FIG. 2, if N_p = 0 (step 206), the output coarse pitch period is set to cpp
= MINPPD (step 208), and the processing of block 40 is terminated. If N_p = 1 (step 210), block 40 output is set to cpp
= k_p(1) (step 212), and the processing of block 40 is terminated.

[0051] If there are two or more local peaks (N_p ≥ 2) (as determined at step 210), then block 40 uses Algorithms A1,
A2, A3, and A4 (each of which is described below), in that order, to determine the output coarse pitch period cpp.
Results, such as variables, calculated in the earlier algorithms will be carried over and used in the later algorithms.
Algorithms A1, A2, A3, and A4 operate repeatedly, for example, on a frame-by-frame basis, to extract successive pitch
periods of the audio signal corresponding to successive frames thereof.

Algorithms 

[0052] Explanatory comments related to the Algorithms A1-A4 described below are enclosed in braces "{}".

Algorithm A1 (Step 214)

[0053] Block 40 first uses Algorithm A1 (step 214) below to identify the largest quadratically interpolated peak around
local peaks of the normalized correlation square c(k_p)^2/E(k_p). Quadratic interpolation is performed for c(k_p), while linear
interpolation is performed for E(k_p). Such interpolation is performed with the time resolution for the sampling rate of
the input speech, which is 16 kHz in the illustrative embodiment of the present invention. In the algorithm below, D
denotes the decimation factor used when decimating sw(n) to swd(n). Therefore, D = 8.
[0054] Algorithm A1 Find largest quadratically interpolated peak around c(k_p)^2/E(k_p):

{At the end of Algorithm A1, c2max/Emax will have been updated to represent a global interpolated maximum NCS
peak}

(i) Set c2max = -1 and set Emax = 1.
{For each of the N_p local peaks, do}

(ii) For j = 1, 2, ..., N_p, do the following 12 steps:

{a and b are coefficients used to calculate quadratically interpolated correlation values ci in step 7 or 8, below}

1. Set a = 0.5 [c(k_p(j)+1) + c(k_p(j)-1)] - c(k_p(j))

2. Set b = 0.5 [c(k_p(j)+1) - c(k_p(j)-1)]

3. Set ji = 0

{ei represents a linearly interpolated energy value; however, other interpolation techniques may be used to
produce the interpolated energy value, such as quadratic techniques, and so on. Note: "i" denotes an intermediate
value.}

4. Set ei = E(k_p(j))

{c2m represents a quadratically interpolated correlation square value. Note: "m" denotes a maximum value.}

5. Set c2m = c^2(k_p(j))

6. Set Em = E(k_p(j))

{Step 7 uses a cross-multiply compare operation to determine if right-side adjacent NCS value c^2(k_p(j)+1)/E(k_p(j)+1)
> left-side adjacent NCS value c^2(k_p(j)-1)/E(k_p(j)-1). If this is the case, then the interpolated NCS
peak resides between time lags k_p(j) and k_p(j)+1, and the remainder of step 7 generates interpolated NCS
values between these time lags, and selects a maximum one of these interpolated NCS values as an interpolated
NCS peak corresponding to the local peak being processed. The ratio of correlation square to energy
representing the NCS signal is not actually calculated, as seen below.}
7. If c^2(k_p(j)+1) E(k_p(j)-1) > c^2(k_p(j)-1) E(k_p(j)+1), do the remaining part of step 7:
{Calculate linearly interpolated energy increment}

    Δ = [E(k_p(j)+1) - ei]/D

{For a plurality of interpolated time lags between k_p(j) and k_p(j)+1, do. Note that "k" below is an integer counter
indicative of interpolated time lags, and is not to be confused with time lag or index "k" above used with c(k),
and so on.}

    For k = 1, 2, ..., D/2, do the following indented part of step 7:

{Calculate quadratically interpolated correlation value ci at interpolated time lag k/D}

        ci = a (k/D)^2 + b (k/D) + c(k_p(j))

{Calculate linearly interpolated energy value corresponding to interpolated correlation value ci}

        Update ei as ei + Δ

{Compare the current interpolated NCS value (ci)^2/ei to a current maximum NCS interpolated value (i.e.,
c2m/Em), to see which is larger. Use a cross-multiply compare operation to avoid actually calculating the
ratios (ci)^2/ei and c2m/Em. If the current NCS value is larger, then this current interpolated NCS value also
becomes the current maximum NCS interpolated value.}

        If (ci)^2 Em > (c2m) ei, do the next three indented lines:

            ji = k
            c2m = (ci)^2
            Em = ei

{Step 8 is similar to step 7, except first check to see if the interpolated NCS peak resides between time lags
k_p(j) and k_p(j)-1, and if so, then generate interpolated NCS values between these time lags}
8. If c^2(k_p(j)+1) E(k_p(j)-1) ≤ c^2(k_p(j)-1) E(k_p(j)+1), do the remaining part of step 8:

    Δ = [E(k_p(j)-1) - ei]/D

    For k = -1, -2, ..., -D/2, do the following indented part of step 8:

        ci = a (k/D)^2 + b (k/D) + c(k_p(j))

        Update ei as ei + Δ

        If (ci)^2 Em > (c2m) ei, do the next three indented lines:

            ji = k
            c2m = (ci)^2
            Em = ei

{After step 7 or step 8, c2m/Em is the interpolated NCS peak at interpolated time lag lag(j) (see below). This
interpolated NCS peak corresponds to local NCS peak c^2(k_p(j))/E(k_p(j)) at time lag k_p(j).}

9. Set lag(j) = k_p(j) + ji/D

10. Set c2i(j) = c2m

11. Set Ei(j) = Em
{Step 12 compares the current NCS interpolated peak (c2i(j)/Ei(j), represented as c2m/Em) selected in either
step 7 or step 8 to a current global maximum interpolated NCS peak c2max/Emax to see which is larger using
a cross-multiply compare operation. If the current NCS interpolated peak is larger, then it becomes the current
global maximum interpolated NCS peak.}

12. If c2m × Emax > c2max × Em, do the following three indented lines:

    jmax = j
    c2max = c2m
    Emax = Em

{At this point, c2max/Emax is the global maximum interpolated NCS peak, and jmax is the j-value identifying
the corresponding interpolated NCS peak c2i(j)/Ei(j), i.e., c2i(jmax)/Ei(jmax). Step (iii) sets cpp = the time lag
of the local peak corresponding to the global maximum interpolated NCS peak. This local peak is the global
maximum local NCS peak.}

(iii) Set the first candidate for coarse pitch period as cpp = k_p(jmax).

End Algorithm A1
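
Algorithm A1 can be sketched in Python roughly as follows. This is an illustrative transcription under stated assumptions rather than a reference implementation: the function name and return tuple are hypothetical, j is 0-based here (the text counts from 1), all energies are assumed positive so that the cross-multiply comparisons are valid, and steps 7 and 8 are folded into one loop whose direction is chosen by the step 7 test.

```python
def algorithm_a1(c, E, kp, D=8):
    """c, E: mappings from lag to correlation / energy.
    kp: ascending list of local-peak lags k_p(j).
    Returns (cpp, jmax, lag, c2i, Ei), with jmax 0-based."""
    c2max, Emax, jmax = -1.0, 1.0, None           # step (i)
    lag, c2i, Ei = [], [], []
    for j, k0 in enumerate(kp):                    # step (ii)
        a = 0.5 * (c[k0 + 1] + c[k0 - 1]) - c[k0]  # step 1
        b = 0.5 * (c[k0 + 1] - c[k0 - 1])          # step 2
        ji = 0                                     # step 3
        ei = E[k0]                                 # step 4
        c2m = c[k0] ** 2                           # step 5
        Em = E[k0]                                 # step 6
        # Step 7 test: is the right neighbor's NCS larger than the left's?
        if c[k0 + 1] ** 2 * E[k0 - 1] > c[k0 - 1] ** 2 * E[k0 + 1]:
            delta = (E[k0 + 1] - ei) / D
            steps = range(1, D // 2 + 1)           # k = 1 .. D/2
        else:                                      # step 8
            delta = (E[k0 - 1] - ei) / D
            steps = range(-1, -(D // 2) - 1, -1)   # k = -1 .. -D/2
        for k in steps:
            ci = a * (k / D) ** 2 + b * (k / D) + c[k0]
            ei += delta
            if ci * ci * Em > c2m * ei:            # (ci)^2/ei > c2m/Em
                ji, c2m, Em = k, ci * ci, ei
        lag.append(k0 + ji / D)                    # step 9
        c2i.append(c2m)                            # step 10
        Ei.append(Em)                              # step 11
        if c2m * Emax > c2max * Em:                # step 12
            jmax, c2max, Emax = j, c2m, Em
    return kp[jmax], jmax, lag, c2i, Ei            # step (iii)


# Toy data: two local peaks at lags 3 and 8, unit energies.
c = {2: 0.6, 3: 1.0, 4: 0.8, 7: 0.5, 8: 0.9, 9: 0.5}
E = {k: 1.0 for k in c}
cpp, jmax, lag, c2i, Ei = algorithm_a1(c, E, [3, 8])
```

In the toy data the first peak is skewed toward its right neighbor, so its interpolated lag moves one sub-sample step to the right, while the symmetric second peak stays at its integer lag.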

[0055] As described above, initial steps 202 and 204 of method 200 produce results stored in Results Table 300.
Algorithm A1 produces further results that may also be stored in a tabular format. FIG. 5 is an example Table 500
including such further results produced by Algorithm A1. Table 500 includes the rows of Table 300, plus a fifth row
including interpolated correlation square values 502 produced in either Algorithm A1, step 7 or Algorithm A1, step 8.
Table 500 includes a sixth row including interpolated energy values 504 also produced in either step 7 or step 8 of
Algorithm A1. The ratios of the interpolated correlation square values 502 to corresponding ones of interpolated energy
values 504 correspond to interpolated NCS peaks 506, returned at steps 10 and 11 of Algorithm A1. A seventh or
bottom row of Table 500 includes interpolated lags 510 (denoted lag(j)), produced at Algorithm A1, step 9.
[0056] As described above, Algorithm A1 searches for, inter alia, a maximum interpolated NCS peak among interpolated
NCS peaks 506 (referred to as the global maximum interpolated NCS peak c2max/Emax) and its corresponding
interpolated time lag, lag(j = jmax). For example, Algorithm A1 may return interpolated NCS peak 512 (encircled by a
dashed line in FIG. 5) as the global maximum interpolated NCS peak (NCS peak c2max/Emax), having a corresponding
interpolated time lag 514 (lag(j = jmax)). Interpolated NCS peak 512 and interpolated time lag 514 correspond to global
maximum NCS local peak 516 and its corresponding time lag 518.

[0057] FIG. 6 is a plot of NCS magnitude against time lag for the example NCS signal 400, similar to the plot of FIG. 
4, except the plot of FIG. 6 includes a series of interpolated NCS values 604 near each of NCS local peaks 308. Also 
illustrated in FIG. 6 are interpolated NCS peaks 506. Each of interpolated peaks 506 is near a corresponding one of 
local peaks 308. 

[0058] FIG. 7 is a flowchart of an example method 700 corresponding generally to Algorithm A1. A first step 702 
corresponds to Algorithm A1, step (ii). Step 702 includes identifying an initial one of NCS local peaks 308 (e.g., local 
peak 308a) for which a corresponding interpolated NCS peak (e.g., interpolated NCS peak 506a) is to be found. A next 
step 704 corresponds generally to either of Algorithm A1, step 7 or step 8. Step 704 includes further steps 706, 708, 
710 and 712. 

[0059] Step 706 includes determining whether to interpolate between the time lag of the identified (that is, currently-being-processed)
local peak and either an adjacent earlier time lag or an adjacent later time lag. This corresponds to
the beginning "if test" of either Algorithm A1, step 7 or Algorithm A1, step 8.

[0060] Step 708 includes producing quadratically interpolated correlation values (e.g., values ci) and their corresponding
interpolated correlation square values (e.g., ci^2).

[0061] Step 710 includes producing interpolated energy values (e.g., ei), each of the energy values corresponding
to a respective one of the correlation square values (e.g., ci^2). The individual ratios of the interpolated correlation square
values (e.g., ci^2) to their corresponding interpolated energy values (e.g., ei), represent interpolated NCS signal values
(e.g., the ratios represent interpolated NCS signal values 604a (ci^2/ei), in FIG. 6).

[0062] Step 712 includes selecting a largest interpolated NCS signal value (e.g., interpolated NCS peak 506a) among
the interpolated NCS values (e.g., among interpolated NCS values 604a). Step 712 includes performing cross-multiply
compare operations between different interpolated NCS values in each group of interpolated NCS values (e.g., in the
group of interpolated NCS values 604a). In this manner the ratio representing the interpolated NCS peak 506a need
not be evaluated or computed.

[0063] A next step 714 includes determining if further local peaks among local peaks 308 are to be processed. If 
further local peaks are to be processed, then a next local peak is identified at step 715, and step 704 is repeated for 
the next local peak. If all of local peaks 308 have been processed, flow control proceeds to step 716. 
[0064] Upon entering step 716, interpolated NCS peaks 506 corresponding to each of NCS local peaks 308 have
been selected, along with their corresponding interpolated time lags 510. Step 716 includes selecting a largest interpolated
NCS peak (for example, interpolated NCS peak 512 in Table 500) among interpolated NCS peaks 506. Step 716
performs this selection using cross-multiply compare operations between different ones of interpolated NCS peaks
506 so as to avoid actually calculating any NCS ratios.
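
The cross-multiply compare operation used in steps 712 and 716 can be sketched as below. The helper names are hypothetical, and the comparison is only equivalent to comparing the ratios when both energy values are positive, which is assumed here.

```python
def ncs_greater(c2_a, e_a, c2_b, e_b):
    """True if c2_a / e_a > c2_b / e_b, without performing a division.

    Valid only for e_a > 0 and e_b > 0: multiplying both sides of the
    inequality by e_a * e_b then preserves its direction.
    """
    return c2_a * e_b > c2_b * e_a


def argmax_ncs(c2, e):
    """Index of the largest ratio c2[j] / e[j], found by pairwise
    cross-multiply comparisons (first index wins on ties)."""
    best = 0
    for j in range(1, len(c2)):
        if ncs_greater(c2[j], e[j], c2[best], e[best]):
            best = j
    return best


# Ratios are 0.5, 0.75 and 1.0, so the last entry is the largest.
winner = argmax_ncs([1.0, 3.0, 2.0], [2.0, 4.0, 2.0])
```

Each comparison costs two multiplications instead of two divisions, which is typically cheaper on fixed-point signal processors.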

[0065] Step 718 includes returning the time lag (e.g., 518) of the local peak (e.g., 516) corresponding to the largest
interpolated NCS peak (e.g., peak 512), selected in step 716, as a candidate coarse pitch period (e.g., cpp) of the
audio signal. The term "returning" means setting the variable cpp equal to the just-mentioned time lag.

Algorithm A2 (Step 216) 



[0066] To avoid picking a coarse pitch period that is around an integer multiple of the true coarse pitch period, Algorithm
A2 (step 216) performs a search through the time lags corresponding to the local peaks of $c^2(k_p)/E(k_p)$ to see if
any of such time lags is close enough to the output coarse pitch period of block 40 in the last frame of the correlation-based
signal (that corresponds to the last frame of the audio signal), denoted as cpplast. If a time lag is within 25% of
cpplast, it is considered close enough. For all such time lags within 25% of cpplast, the corresponding quadratically
interpolated peak values of the normalized correlation square $c^2(k_p)/E(k_p)$ are compared, and the interpolated time lag
(e.g., time lag lag(im) from Algorithm A2 below) corresponding to the maximum normalized correlation square (e.g.,
c2m/Em = c2i(im)/Ei(im) from Algorithm A2 below) is selected for further consideration.

Algorithm A2 below performs the task described above. The interpolated arrays c2i(j) and Ei(j) calculated in Algorithm
A1 above (see Results Table 5) are used in this algorithm.

[0067] Algorithm A2 Find the time lag maximizing interpolated $c^2(k_p)/E(k_p)$ among all time lags close to the output
coarse pitch period of the last frame:

(i) Set index im = -1

(ii) Set c2m = -1

(iii) Set Em = 1

{For each of time lags k_p(j) 310, do}

(iv) For j = 1, 2, ..., N_p, do the following:

    {If the currently-being-processed time lag k_p(j) is within a predetermined time lag range of, that is, near, the
    previously determined pitch period cpplast, then do}

    If |k_p(j) - cpplast| < 0.25 x cpplast, do the following:

        {If the interpolated NCS peak corresponding to (that is, next to) the currently-being-processed local peak
        near cpplast is greater than a current maximum interpolated NCS peak near cpplast, then set the
        currently-being-processed interpolated NCS peak to the current maximum. This step includes performing the
        comparison c2i(j)/Ei(j) > c2m/Em using a cross-multiply compare operation.}

        If c2i(j) x Em > c2m x Ei(j), do the following three lines:

            im = j

            c2m = c2i(j)

            Em = Ei(j)

End Algorithm A2

[0068] Note that if there is no time lag k_p(j) within 25% of cpplast, then the value of the index im will remain at -1
after Algorithm A2 is performed. If there are one or more time lags within 25% of cpplast, the index im corresponds to 
the largest normalized correlation square among such time lags. 
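
Algorithm A2 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: it assumes 0-based list indices (the description counts from 1), strictly positive interpolated energies, and plain Python lists as inputs. The function name algorithm_a2 is hypothetical.

```python
def algorithm_a2(kp, c2i, Ei, cpplast):
    """Sketch of Algorithm A2 with 0-based indices.

    kp      : time lags of the local peaks
    c2i     : interpolated correlation-square values, one per local peak
    Ei      : interpolated energy values (assumed > 0), one per local peak
    cpplast : coarse pitch period output for the previous frame
    Returns (im, c2m, Em); im == -1 means no time lag was within 25% of cpplast.
    """
    im, c2m, Em = -1, -1.0, 1.0                      # steps (i)-(iii)
    for j in range(len(kp)):                         # step (iv)
        if abs(kp[j] - cpplast) < 0.25 * cpplast:
            # Cross-multiply compare: c2i[j] / Ei[j] > c2m / Em
            if c2i[j] * Em > c2m * Ei[j]:
                im, c2m, Em = j, c2i[j], Ei[j]
    return im, c2m, Em
```

A caller would test the returned index against -1 before using lag(im), mirroring the note in paragraph [0068].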
[0069] FIG. 8 is a flowchart of an example method 800 corresponding generally to Algorithm A2. A first step 802
includes determining if any time lags among time lags 310 are near previously determined pitch period cpplast. Pitch 
period cpplast was determined for a previous frame of the audio signal. 

[0070] A next step 804 includes comparing the interpolated NCS peaks corresponding to those time lags determined 
to be near previously determined pitch period cpplast from step 802. Step 804 includes comparing the interpolated
peaks to one another using cross-multiply compare operations. 

[0071] A next step 806 includes selecting the interpolated time lag corresponding to a largest interpolated peak 
among the compared interpolated peaks from step 804. 

Algorithm A3 (Step 218)

[0072] Next, Algorithm A3 (step 218) of block 40 determines whether an alternative time lag in the first half of the
pitch range should be chosen as the output coarse pitch period. Basically, Algorithm A3 searches through all interpolated
time lags lag(j) that are less than a predetermined time lag, such as 16, and checks whether any of them has a
large enough local peak of normalized correlation square near every integer multiple of it (including itself) up to twice
the predetermined time lag, such as 32. If there are one or more such time lags satisfying this condition, the smallest
of such qualified time lags is chosen as the output coarse pitch period of block 40. This search technique for pitch 
period extraction is referred to herein as "pitch extraction using multiple time lag extraction" because of the use of the 
integer multiples of identified time lags. 

[0073] Again, variables calculated in Algorithms A1 and A2 above carry their final values over to Algorithm A3 below.
In the following, the parameter MPDTH is 0.06, and the threshold array MPTH(k) is given as MPTH(2) = 0.7,
MPTH(3) = 0.55, MPTH(4) = 0.48, MPTH(5) = 0.37, and MPTH(k) = 0.30 for k > 5, where MPTH stands for Multiple Pitch
Period Threshold.

[0074] Algorithm A3 Check whether an alternative time lag in the first half of the range of the coarse pitch period
should be chosen as the output coarse pitch period:

{Outer loop: Process each time lag separately, and in an order of increasing time lag beginning with the smallest time
lag.}

For j = 1, 2, 3, ..., in that order, do the following while lag(j) < 16:

{If the currently-being-processed time lag is not the time lag (lag(im)) near the previously determined pitch period
cpplast (determined in Algorithm A2), then set a higher peak threshold to overcome. In other words, Algorithm A3
favors the time lag selected in Algorithm A2 near the previously determined pitch period cpplast, when it exists, over
other time lags.}

(i) If j ≠ im, set threshold = 0.73; otherwise, set threshold = 0.4.

{Step (ii) below determines if the currently-being-processed time lag qualifies for further testing. Step (ii)
includes determining if the peak corresponding to the currently-being-processed time lag exceeds a threshold
based on the threshold set in step (i). If yes (the time lag is qualified), then go on to step (iii) a), below. If no,
continue to process/examine the next time lag and its corresponding peak.}

(ii) If c2i(j) x Emax < threshold x c2max x Ei(j), disqualify this j, skip step (iii) for this j, increment j by 1 and go
back to step (i).

{If the time lag/peak qualified, then begin at step (iii) a) below.}

(iii) If c2i(j) x Emax > threshold x c2max x Ei(j), do the following:

{Set up an individual time window coinciding with each one of integer multiples of the time lag (e.g., a first
time window coinciding with 2 x lag(j), a second time window coinciding with 3 x lag(j), and so on). Each time
window extends between a lower bound a and an upper bound b. Then determine if there exists a respective,
sufficiently large peak near each of the integer multiples of lag(j), that is, having a time lag falling within the time
window}. For example, determine if there is (i) a first sufficiently large peak within a first predetermined time range
(i.e., first time window) of 2 x lag(j), (ii) a second sufficiently large peak within a second predetermined time range
(i.e., a second time window) of 3 x lag(j), and so on.

a) For k = 2, 3, 4, ..., do the following while k x lag(j) < 32:

    1. s = k x lag(j)

    2. a = (1 - MPDTH) s

    3. b = (1 + MPDTH) s

    4. Go through m = j+1, j+2, j+3, ..., N_p, in that order, and see if any of the time lags lag(m) is between a
    and b. If none of them is between a and b, disqualify this j, stop step (iii), increment j by 1 and go back to
    step (i). If there is at least one such m that satisfies a < lag(m) ≤ b and c2i(m) x Emax > MPTH(k) x c2max
    x Ei(m), then it is considered that a large enough peak of the normalized correlation square is found in
    the neighborhood of the k-th integer multiple of lag(j); in this case, stop step (iii) a) 4., increment k by 1,
    and go back to step (iii) a) 1.

b) If step (iii) a) is completed without stopping prematurely, that is, if there is a large enough interpolated peak
of the normalized correlation square within ±100 x MPDTH % of every integer multiple of lag(j) that is less than
32, then stop this algorithm and stop the operation of block 40, and set cpp = k_p(j) as the final output coarse
pitch period of block 40.

End Algorithm A3 
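
The control flow of Algorithm A3 can be sketched in Python as follows. This is an illustrative reading under several assumptions that are not stated in the description: indices are 0-based, the interpolated time lags are sorted in increasing order, a window that contains time lags but no sufficiently large peak disqualifies the candidate, and None is returned when no time lag qualifies. The names algorithm_a3, mpth and lag_limit are hypothetical.

```python
MPDTH = 0.06  # window half-width as a fraction of the multiple


def mpth(k):
    """Multiple Pitch Period Threshold MPTH(k) for k >= 2, as listed in the description."""
    return {2: 0.7, 3: 0.55, 4: 0.48, 5: 0.37}.get(k, 0.30)


def algorithm_a3(lag, kp, c2i, Ei, im, c2max, Emax, lag_limit=16.0):
    """Return the coarse pitch period selected by Algorithm A3, or None.

    lag         : interpolated time lags, assumed sorted in increasing order
    kp          : time lags of the corresponding local peaks
    c2i, Ei     : interpolated correlation-square and energy values (Ei > 0)
    im          : index found by Algorithm A2, or -1
    c2max, Emax : numerator and denominator of the global maximum interpolated peak
    """
    n = len(lag)
    for j in range(n):
        if lag[j] >= lag_limit:
            break                                    # outer loop runs while lag(j) < 16
        threshold = 0.73 if j != im else 0.4         # step (i)
        if c2i[j] * Emax < threshold * c2max * Ei[j]:
            continue                                 # step (ii): disqualify this j
        qualified = True
        k = 2
        while k * lag[j] < 2 * lag_limit:            # step (iii) a)
            s = k * lag[j]
            a, b = (1 - MPDTH) * s, (1 + MPDTH) * s
            # Assumption: a lag in the window without a large enough peak does not count.
            found = any(
                a < lag[m] <= b and c2i[m] * Emax > mpth(k) * c2max * Ei[m]
                for m in range(j + 1, n)
            )
            if not found:
                qualified = False
                break
            k += 1
        if qualified:
            return kp[j]                             # step (iii) b)
    return None
```

Returning None lets the caller fall through to Algorithm A4, which matches the order in which the description applies the two algorithms.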

[0075] FIG. 9 is a flowchart of an example method 900 corresponding generally to Algorithm A3. Method 900 proc- 
esses each of interpolated time lags, lag (j), individually and in an order of increasing time lag beginning with the 
smallest time lag, as identified in a step 902. 

[0076] A next step 904 includes setting a threshold or weight depending on whether the identified interpolated time 
lag (that is, the time lag currently-being-processed) is the time lag, lag(im), determined in Algorithm A2. Step 904
corresponds to Algorithm A3, step (i). 

[0077] A next step 906 includes determining if the identified interpolated time lag qualifies for further testing. This 
includes determining if the interpolated peak corresponding to the identified time lag is sufficiently large, that is, exceeds
a threshold based on the weight set in step 904 and the global maximum interpolated NCS peak 512. Step 906 cor- 
responds to Algorithm A3, step (ii). 

[0078] If the identified interpolated time lag qualifies for further testing, then flow proceeds to step 908. Step 908
includes determining if there is an interpolated time lag among interpolated time lags 510 that 

(i) is sufficiently near a respective one of one or more integer multiples of the identified interpolated time lag, and 

(ii) corresponds to an interpolated NCS peak exceeding a peak threshold. For the determination of step 908 to 
pass (that is, to evaluate as "True"), each of the above-listed test conditions (i) and (ii) of step 908 must be satisfied 
for each of the integer multiples k. Step 908 corresponds to Algorithm A3, steps a)1., a)2., a)3., and portions of
step a)4. 

[0079] A next step 910 tests whether the determination of step 908 passed. If the determination of step 908 passed, 
then flow proceeds to a step 912. Step 912 includes setting the pitch period to the time lag k_p(j) corresponding to the
identified interpolated time lag, lag(j). Step 912 corresponds to Algorithm A3, step (iii)b).

[0080] Returning to step 906, if the identified interpolated lag does not qualify for further testing, then flow proceeds 
to a step 914. Similarly, if the determination in step 908 failed, then flow also proceeds to step 914. 
[0081] Step 914 includes determining whether a desired number, which may be all, of the interpolated time lags have
been tested or searched by Algorithm A3. If the desired number of interpolated time lags have been tested or searched, 
then Algorithm A3 ends. Conversely, if further time lags are to be searched, then the next time lag is identified at step 
920, and flow proceeds back to step 904. 

[0082] FIG. 10 is an example plot of correlation-based magnitude (such as NCS magnitude, for example) against 
time lag, which serves as a useful illustration of portions of Algorithm A3. Assume step 902 or 920 identifies a time lag
1002a (lag(j)) to be tested, where the time lag corresponds to a peak 1002. Assume Algorithm A3, steps (iii)a)1.-(iii)a)3.,
generate successive time windows 1004, 1006 and 1008 coinciding with respective successive time lags: 2 x
lag(j); 3 x lag(j); and 4 x lag(j), where the multipliers 2, 3 and 4 are representative of an integer multiplier or counter k.
[0083] Also assume Algorithm A3, step (iii)a)4. uses, or generates and uses, successive peak thresholds 1010, 1012
and 1014 corresponding to respective time windows 1004, 1006 and 1008, according to threshold function MPTH(k)
x c2max/Emax. Thus, peak thresholds 1010-1014 are a function of the identified time lag multiple k.
[0084] For step 908 to pass, there must exist peaks and their corresponding time lags (among the peaks and time 
lags of Tables 3 and 5, for example) that meet both conditions (i) and (ii) of step 908. For example, assume there exist 
peaks 1020, 1022 and 1024 corresponding to respective time lags 1020a, 1022a and 1024a, that fall within respective 
time windows 1004, 1006, and 1008. Thus, in the scenario depicted in FIG. 10, the first condition (i) of step 908 is 
satisfied. Note that if one or more of the time windows did not coincide with a respective time lag, then condition (i) of 
step 908 would not be satisfied, and the determination of step 908 would fail. 

[0085] For step 908 to pass, condition (ii) must also be satisfied. That is, each of peaks 1020, 1022 and 1024 must 
be sufficiently large, that is, must exceed its respective one of peak thresholds 1010, 1012 and 1014. As seen in FIG.
10, peak 1024 falls below its respective peak threshold 1014. Thus, condition (ii) of step 908 is not satisfied, and the
determination of step 908 fails. On the other hand, if peak 1024 were above its respective peak threshold 1014, then
there would be a sufficiently large peak sufficiently near each integer multiple of identified lag(j), and both conditions
(i) and (ii) of step 908 would be met; that is, the determination of step 908 would pass (i.e., evaluate to "True").
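
As a numeric illustration of the time windows and peak thresholds discussed for FIG. 10, the following Python sketch lists the window bounds and the threshold MPTH(k) x c2max/Emax for each integer multiple below 32. The example time lag of 7 and the unit values of c2max and Emax are assumptions chosen for illustration, and the function name multiple_windows is hypothetical. The threshold is formed here as an explicit ratio for readability only.

```python
def multiple_windows(lag_j, c2max, emax, mpdth=0.06, upper=32.0):
    """List (k, a, b, peak_threshold) for every integer multiple k with k * lag_j < upper."""
    mpth = {2: 0.7, 3: 0.55, 4: 0.48, 5: 0.37}   # MPTH(k); 0.30 for k > 5
    windows = []
    k = 2
    while k * lag_j < upper:
        s = k * lag_j                             # centre of the window
        a, b = (1 - mpdth) * s, (1 + mpdth) * s   # lower and upper bounds
        windows.append((k, a, b, mpth.get(k, 0.30) * c2max / emax))
        k += 1
    return windows


# For lag_j = 7 the windows are centred on 14, 21 and 28 (k = 2, 3, 4).
for k, a, b, th in multiple_windows(7.0, 1.0, 1.0):
    print(f"k={k}: window [{a:.2f}, {b:.2f}], threshold {th:.2f}")
```

With these inputs the three windows play the role of time windows 1004, 1006 and 1008, and the three thresholds play the role of peak thresholds 1010, 1012 and 1014.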

Algorithm A4 (Step 220) 

[0086] If Algorithm A3 above is completed without finding a qualified output coarse pitch period cpp, then block 40 
examines the largest local peak of the normalized correlation square around the coarse pitch period of the last frame, 
found in Algorithm A2 above, and makes a final decision on the output coarse pitch period cpp using Algorithm A4 
(step 220) below. Again, variables calculated in Algorithms A1 and A2 above carry their final values over to Algorithm
A4 below. In the following, the parameters are SMDTH = 0.095 and LPTH1 = 0.78. 

Algorithm A4 Final decision of the output coarse pitch period: 

[0087] 

(i) If im = -1, that is, if there is no large enough local peak of the normalized correlation square around the coarse
pitch period of the last frame, then use the cpp calculated at the end of Algorithm A1 as the final output coarse
pitch period of block 40, and exit this algorithm.

(ii) If im = jmax, that is, if the largest local peak of the normalized correlation square around the coarse pitch period
of the last frame is also the global maximum of all interpolated peaks of the normalized correlation square within
this frame, then use the cpp calculated at the end of Algorithm A1 as the final output coarse pitch period of block
40, and exit this algorithm.

(iii) If im < jmax, do the following indented part:

    If c2m x Emax > 0.43 x c2max x Em, do the following indented part of step (iii):

        a) If lag(im) > MAXPPD/2, set block 40 output cpp = k_p(im) and exit this algorithm.

        b) Otherwise, for k = 2, 3, 4, 5, do the following indented part:

            1. s = lag(jmax) / k

            2. a = (1 - SMDTH) s

            3. b = (1 + SMDTH) s

            4. If lag(im) > a and lag(im) < b, set block 40 output cpp = k_p(im) and exit this algorithm.

(iv) If im > jmax, do the following indented part:

    If c2m x Emax > LPTH1 x c2max x Em, set block 40 output cpp = k_p(im) and exit this algorithm.

(v) If algorithm execution proceeds to here, none of the steps above have selected a final output coarse pitch 
period. In this case, just accept the cpp calculated at the end of Algorithm A1 as the final output coarse pitch period
of block 40. 

End Algorithm A4 
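
The decision logic of Algorithm A4 can be sketched in Python as follows. The sketch assumes 0-based indices and passes MAXPPD in as the parameter maxppd, because its value is not given in this part of the description. The function name algorithm_a4 and the argument cpp_a1 (the cpp produced by Algorithm A1) are hypothetical names.

```python
SMDTH = 0.095
LPTH1 = 0.78


def algorithm_a4(cpp_a1, im, jmax, lag, kp, c2m, Em, c2max, Emax, maxppd):
    """Return the final coarse pitch period cpp according to steps (i)-(v)."""
    if im == -1 or im == jmax:                  # steps (i) and (ii)
        return cpp_a1
    if im < jmax:                               # step (iii)
        if c2m * Emax > 0.43 * c2max * Em:
            if lag[im] > maxppd / 2:            # step (iii) a)
                return kp[im]
            for k in (2, 3, 4, 5):              # step (iii) b)
                s = lag[jmax] / k
                a, b = (1 - SMDTH) * s, (1 + SMDTH) * s
                if lag[im] > a and lag[im] < b:
                    return kp[im]
    else:                                       # step (iv): im > jmax
        if c2m * Emax > LPTH1 * c2max * Em:
            return kp[im]
    return cpp_a1                               # step (v)
```

All peak comparisons are written as cross-multiplications, so the sketch never forms the ratios c2m/Em or c2max/Emax.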

[0088] FIGs. 11A and 11B are flowcharts that collectively represent an example method 1100 corresponding to Algorithm
A4. A first step 1102 includes receiving, accessing or retrieving a candidate local peak (CLP) indicator, such
as indicator im produced in Algorithm A2. As described above, Algorithm A2 searches for a sufficiently large local peak
positioned near (that is, within a predetermined time lag range of) a previously determined pitch period of the audio 
signal. Such a peak, when found, is referred to as a candidate local peak (CLP). Algorithm A2 returns a CLP indicator 
(e.g., variable im) indicating whether a CLP was found. The CLP indicator (e.g., variable im) has either: 

(i) a first indicator value indicating a CLP exists (e.g., im = a valid time lag or time lag index corresponding to a 
found CLP); or 

(ii) a second indicator value indicating that no CLP exists (e.g., im = an invalid time lag or time lag index, such as
"-1"). The first and second CLP indicator values are equivalently referred to herein as first and second CLP indicators,
respectively.

[0089] A next step 1104 includes determining which of the first and second CLP indicators (e.g., indicator values)
was received in step 1102. If the second CLP indicator was received, then a step 1106 includes setting the pitch period
equal to the time lag corresponding to the global maximum local peak. Steps 1104 and 1106 correspond to Algorithm 
A4, step (i). 

[0090] If the first CLP indicator was received in step 1102, then a next step 1108 includes determining if the CLP is 
the same as the global maximum local peak. If this is the case, then a step 1109 includes setting the pitch period equal
to the time lag corresponding to the global maximum local peak. Steps 1108 and 1109 correspond to Algorithm A4, 
step (ii). 

[0091] If step 1108 determines that the CLP is not the same as the global maximum local peak, then flow proceeds
to a next step 1110 (FIG. 11B). Step 1110 includes determining if the time lag corresponding to the CLP is less than
the time lag corresponding to the global maximum local peak. If the determination of step 1110 is true, then a next step
1112 includes determining if the CLP exceeds a peak threshold PKTH_2 (where PKTH_2 = 0.43 x c2max/Emax, in Algorithm
A4, step (iii)). If the CLP exceeds the peak threshold, then a next step 1114 includes determining if the time lag of the
CLP is greater than a predetermined time lag threshold (MAXPPD/2 in Algorithm A4, step (iii)a)). If the determination of step
1114 is false, then a next step 1116 includes determining if the time lag corresponding to the CLP is near (that is, within
a predetermined range of) at least one integer sub-multiple of the time lag corresponding to the global maximum local
peak (Algorithm A4, step (iii)b)). If the determination of step 1116 returns True (i.e., passes), then a next step 1118
includes setting the pitch period equal to the time lag of the CLP (Algorithm A4, step (iii)b)).

[0092] Returning to step 1110, if the time lag corresponding to the CLP is not less than the time lag corresponding 
to the global maximum local peak, then flow proceeds to a step 1122. Step 1122 includes determining if the CLP 
exceeds a peak threshold PKTH_3 (where PKTH_3 = LPTH1 x c2max/Emax, in Algorithm A4, step (iv)). If the determination
of step 1122 is false, then flow proceeds to a step V. If the determination of step 1122 is true, then a next step
1124 includes setting the pitch period equal to the time lag corresponding to the CLP. 
[0093] Returning to step 1112, if the determination of step 1112 is false, the flow proceeds to step V. 
[0094] Returning to step 1114, if the determination of step 1114 is true, then flow proceeds to a next step 1126. At 
step 1126, the pitch period is set equal to the time lag corresponding to the CLP.

[0095] Step V includes a step 1130. Step 1130 includes setting the pitch period equal to the time lag corresponding
to the global maximum local peak. Referring to FIG. 11B, steps 1110, 1112, 1114, 1116, 1118 and 1126 correspond
generally to Algorithm A4, step (iii). Steps 1122 and 1124 correspond generally to Algorithm A4, step (iv). Also, step
1130 corresponds to Algorithm A4, step (v).

[0096] FIG. 11C is a plot of correlation-based magnitude against time lag which serves as an illustration of Algorithm
A4, step (iii)b), and similarly, step 1116 of method 1100. Algorithm A4, step (iii)b) determines whether the time lag of
the CLP (lag(im)) coincides with, that is, falls within, any of time lag ranges 1150, 1152, 1154 and 1156, centered around
respective time lags lag(jmax)/2, lag(jmax)/3, lag(jmax)/4 and lag(jmax)/5, where lag(jmax) is the time lag of the global
maximum peak of the correlation-based signal. If the time lag of the CLP does fall within any of these ranges, then the
time lag is returned as the pitch period, assuming the time lag < MAXPPD/2 (step 1114) and the CLP > PKTH_2 (step
1112). Embodiments of the present invention include omitting steps 1112 and 1114, which reduces computational complexity,
but may also reduce the accuracy of a determined pitch period.

Block 50 



[0097] Block 50 takes cpp as its input and performs a second-stage pitch period search in the undecimated signal 
domain to get a refined pitch period pp. Block 50 first converts the coarse pitch period cpp to the undecimated signal 
domain by multiplying it by the decimation factor D, where D = 8 for 16 kHz sampling rate. Then, it determines a search
range for the refined pitch period around the value cpp x D. Let MINPP and MAXPP be the minimum and maximum
allowed pitch period in the undecimated signal domain, respectively. Then, the lower bound of the search range is lb
= max(MINPP, cpp x D - D + 1), and the upper bound of the search range is ub = min(MAXPP, cpp x D + D - 1). In
this embodiment, MINPP = 10 and MAXPP = 265.
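
The search range computation can be written out as a short Python sketch, using the values of D, MINPP and MAXPP given for this embodiment. The function name refined_search_range is hypothetical.

```python
MINPP, MAXPP = 10, 265   # pitch period limits in the undecimated domain


def refined_search_range(cpp, D=8):
    """Return (lb, ub), the refined pitch period search range around cpp * D."""
    lb = max(MINPP, cpp * D - D + 1)
    ub = min(MAXPP, cpp * D + D - 1)
    return lb, ub
```

For example, a coarse pitch period of 10 maps to the range 73 to 87, that is, the 2D - 1 = 15 undecimated time lags centered on 80, while values near either end of the pitch range are clipped to MINPP or MAXPP.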

[0098] Block 50 maintains an input speech signal buffer with a total of MAXPP + 1 + FRSZ samples, where FRSZ
is the frame size, which is 80 samples in this embodiment. The last FRSZ samples of this buffer are populated with
the input speech signal s(n) in the current frame. The first MAXPP + 1 samples are populated with the MAXPP + 1
samples of input speech signal s(n) immediately preceding the current frame. Again, without loss of generality, let the
index range from n = 1 to n = FRSZ denote the samples in the current frame.

[0099] After the lower bound lb and upper bound ub of the pitch period search range are determined, block 50 
calculates the following correlation and energy terms in the undecimated s(n) signal domain for time lags that are within 
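the search range [lb, ub].

\[
c(k) = \sum_{n=1}^{\mathrm{FRSZ}} s(n)\, s(n-k)
\]

\[
E(k) = \sum_{n=1}^{\mathrm{FRSZ}} s^2(n-k)
\]

[0100] The time lag $k \in [lb, ub]$ that maximizes the ratio $c^2(k)/E(k)$ is chosen as the final refined pitch period. That is,

\[
pp = \arg\max_{k \in [lb,\, ub]} \left[ \frac{c^2(k)}{E(k)} \right]
\]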


[0101] This completes the description of this embodiment of the present invention. 
Generalized and Alternative Embodiments 

[0102] FIG. 12 is a flowchart of a generalized method 1200, according to embodiments of the present invention. 
Method 1200 encompasses at least portions of the methods and Algorithms described above, in addition to further 
methods of the present invention. A first step 1204 includes deriving or generating a correlation-based signal from an
audio signal. Step 1204 may derive the NCS signal described above, or any other correlation-based signal, such as a
correlation square signal that is not normalized, or that is normalized using a signal other than an energy signal. Step
1204 may derive the correlation-based signal from a decimated audio signal, as in steps 202 and 204, or from an audio
signal that is not decimated. Thus, the correlation-based signal may include correlation-based signal values corre- 
sponding to decimated time lags, or to correlation-based signal values that correspond to non-decimated time lags. 
The information and results produced in step 1204 are considered known or predetermined for purposes of their further 
use in subsequent methods. 

[0103] A next step 1206 includes performing one or more of: 

(i) Algorithm A1 or a variation thereof (collectively referred to as Algorithm A1'), to return a pitch period of the audio
signal;

(ii) Algorithm A2 or a variation thereof (collectively referred to as Algorithm A2'), to return a pitch period of the
audio signal;

(iii) Algorithm A3 or a variation thereof (collectively referred to as Algorithm A3'), to return a pitch period of the
audio signal; and

(iv) Algorithm A4 or a variation thereof (collectively referred to as Algorithm A4'), to return a pitch period of the
audio signal.

[0104] For example, step 1206 may include performing only Algorithm A1', only Algorithm A2', only Algorithm A3',
or only Algorithm A4'. Alternatively, step 1206 may include performing Algorithm A1' and Algorithm A3', but not Algorithms
A2' and A4', and so on. Any combination of Algorithms A1'-A4' may be performed. Performing a lesser number
of the Algorithms reduces computational complexity relative to performing a greater number of the Algorithms, but may
also reduce the determined pitch period accuracy. A "variation" of any of the Algorithms A1, A2, A3 and A4 may include
performing only a portion, for example, only some of the steps of that Algorithm. Also, a variation may include performing 
the respective Algorithm without using decimated or interpolated correlation-based signals, as described below. 
[0105] Algorithms A1-A4 have been described above by way of example as depending on both decimated and interpolated
correlation-based signals and related variables. It is to be understood that embodiments of the present
invention do not require both decimated and interpolated correlation-based signals and variables. For example, Algorithms
A3' and A4' and their related methods may process or relate to either decimated or non-decimated correlation-based
signals, and may be implemented in the absence of interpolated signals (such as in the absence of interpolated
time lags and interpolated peaks). For example, method 900 may operate on local peaks of a non-decimated correla- 
tion-based signal, and thus in the absence of interpolated signals. 

[0106] FIG. 13 is a plot of correlation-based magnitude against time lag for a generalized correlation-based signal 
1300 (for example, as derived in step 1204 of FIG. 12). Correlation-based signal 1300 includes correlation-based 
values 1302 extending across the time lag axis. Correlation-based signal 1300 includes local peaks 1304a, 1304b,
and 1304c, for example. Correlation-based signal 1300 includes a global maximum local peak 1304b. Correlation-based
signal 1300 may be a correlation square signal, an NCS signal, or any other correlation-based signal. Correlation- 
based signal 1300 may be non-decimated, or alternatively, decimated. 

[0107] FIG. 14 is a flowchart of an example method 1400 for processing a correlation-based signal, such as signal 
1300. Method 1400 corresponds generally to steps 1112, 1116 and 1118 of method 1100. 

[0108] A first step 1402 includes determining if a candidate peak among local peaks 1304 in signal 1300, for example,
exceeds a peak threshold.

[0109] A next step 1404 includes determining if the candidate time lag corresponding to the candidate peak is near
at least one integer sub-multiple of the time lag corresponding to global maximum peak 1304b (e.g., of the signal 1300).
[0110] A next step 1406 includes setting a pitch period equal to the candidate time lag when the determinations of 
both steps 1402 and 1404 are true. 

[0111] This search technique for pitch period extraction is referred to herein as "pitch extraction using sub-multiple 
time lag extraction" because of the use of the integer sub-multiples of the time lag corresponding to the global maximum 
peak. 
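
Method 1400 can be sketched in Python for a generalized correlation-based signal as follows. The tolerance of 0.095 and the divisors 2 to 5 are borrowed from Algorithm A4 (SMDTH and k = 2, 3, 4, 5) as assumptions, since method 1400 only requires "near" and "at least one integer sub-multiple". The function name submultiple_pitch and the None return value are hypothetical.

```python
def submultiple_pitch(candidate_peak, candidate_lag, global_lag,
                      peak_threshold, tolerance=0.095, divisors=(2, 3, 4, 5)):
    """Return candidate_lag if steps 1402 and 1404 both hold, otherwise None."""
    if not candidate_peak > peak_threshold:            # step 1402
        return None
    for k in divisors:                                 # step 1404
        s = global_lag / k                             # integer sub-multiple
        if (1 - tolerance) * s < candidate_lag < (1 + tolerance) * s:
            return candidate_lag                       # step 1406
    return None
```

With a global maximum at time lag 30.5, a candidate at time lag 10 lies within 9.5% of 30.5/3 and is therefore accepted, provided its peak exceeds the threshold.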

Systems and Apparatuses 

[0112] FIG. 15 is a block diagram of an example system 1500 for performing one or more of the methods of the 
present invention. System 1500 includes an input/output (I/O) block or module 1502 for receiving an audio signal 1504
and for providing a determined pitch period (for example, cpp or pp) 1506 to external users. System 1500 also includes
a correlation-based signal generator 1510, a module 1512 for performing Algorithm A1' and/or related methods, a
module 1514 for performing Algorithm A2' and/or related methods, a module 1516 for performing Algorithm A3' and/or
related methods, and a module 1518 for performing Algorithm A4' and/or related methods, all coupled to one another
and to I/O module 1502 over or through a communication interface 1522.

[0113] Generator 1510 generates or derives correlation-based signal results 1524, such as correlation values,
correlation square values, corresponding energy values, time lags, and so on, based on audio signal 1504. Module 
1512 generates results 1526, including interpolated NCS peaks 506 and corresponding lags 510, and determined 
global maximum interpolated and local peaks 506, and so on. Module 1514 generates results 1528, including a CLP 
indicator. Module 1516 produces results 1530 in accordance with Algorithm A3', including a determined pitch period
when one exists. Module 1518 produces results 1532 in accordance with Algorithm A4', including a determined pitch
period. Modules 1502 and 1510-1518 may be implemented in software, hardware, firmware or any combination thereof.
[0114] FIG. 16 is a block diagram of an example arrangement of module 1512. Module 1512 includes a module 1602 
for producing results 1604, including Quadratically Interpolated Correlation (QIC) signal values (e.g., ci) and square 
QIC signal values (e.g., ci²). For example, module 1512 performs step 708 of method 700. Module 1512 also includes
a module 1606 for producing interpolated energy signal values 1608 (e.g., ei) corresponding to square QIC values 
included in results 1604. For example, module 1512 performs step 710 of method 700. A selector 1610, including a 
comparator 1612, selects a largest interpolated NCS signal value or NCS peak (represented in results 1604 and 1608)
based on cross-multiply compare operations performed by comparator 1612. For example, module 1610 performs step
712 of method 700. 

[0115] FIG. 17 is a block diagram of an example arrangement of module 1514. Module 1514 includes a determiner 
module 1702 for determining if time lags included in results 1524 are near a previously determined pitch period of audio
signal 1504. For example, module 1702 performs step 802 of method 800. Module 1514 includes a comparator 1704 
for comparing interpolated peaks corresponding to the time lags determined to be near the previous pitch period (by 
module 1702). For example, module 1704 performs step 804 of method 800. Module 1514 further includes a selector
1706 to select a time lag corresponding to a largest one of the interpolated peaks compared at module 1704. For
example, module 1706 performs step 806 of method 800.

[0116] FIG. 18 is an example arrangement of module 1516. Module 1516 includes further modules 1802, 1804 and 
1806. Signals and indicators flow between modules 1802-1806 as necessary to implement Algorithm A3' as embodied 
in method 900, for example. Module 1802 performs steps 902-906 of method 900. Module 1804 performs step 908 of
method 900. Module 1806 performs at least steps 910 and 912 of method 900, and may also perform one or more of
steps 914 and 920 of method 900.

[0117] FIG. 19 is a block diagram of an example arrangement of module 1518. Module 1518 includes further modules 
1902, 1904, 1906 and 1908. Signals and indicators flow between modules 1902-1908 as necessary to implement
Algorithm A4' as embodied in methods 1100 and 1400, for example. Module 1902 performs step 1402 of method 1400,
or step 1112 of method 1100. Module 1904 performs step 1404 of method 1400, or step 1116 of method 1100. Module
1906 performs step 1406 of method 1400, or step 1118 of method 1100. Module 1908 performs further conditional
logic steps, such as steps 1110, 1112, 1114 and/or 1122 of method 1100, for example.

Hardware and Software Implementations 


[0118] The following description of a general purpose computer system is provided for completeness. The present
invention can be implemented in hardware, or as a combination of software and hardware. Consequently, the invention 
may be implemented in the environment of a computer system or other processing system. An example of such a 
computer system 2000 is shown in FIG. 20. In the present invention, all of the signal processing blocks depicted in 
FIGs. 1 and 1 5-1 9, for example, can execute on one or more distinct computer systems 2000, to implement the various 
methods of the present invention. The computer system 2000 includes one or more processors, such as processor 
2004. Processor 2004 can be a special purpose or a general purpose digital signal processor. The processor 2004 is 
connected to a communication infrastructure 2006 (for example, a bus or network). Various software implementations 
are described in terms of this exemplary computer system. After reading this description, it will become apparent to a 
20 person skilled in the relevant art how to implement the invention using other computer systems and/or computer ar- 
chitectures. 

[0119] Computer system 2000 also includes a main memory 2008, preferably random access memory (RAM), and
may also include a secondary memory 2010. The secondary memory 2010 may include, for example, a hard disk drive
2012 and/or a removable storage drive 2014, representing a floppy disk drive, a magnetic tape drive, an optical disk
drive, etc. The removable storage drive 2014 reads from and/or writes to a removable storage unit 2018 in a well known
manner. Removable storage unit 2018 represents a floppy disk, magnetic tape, optical disk, etc., which is read by and
written to by removable storage drive 2014. As will be appreciated, the removable storage unit 2018 includes a computer
usable storage medium having stored therein computer software and/or data. One or more of the above described
memories can store results produced in embodiments of the present invention, for example, results stored in Tables
300 and 500, and determined coarse and fine pitch periods, as discussed above.

[0120] In alternative implementations, secondary memory 2010 may include other similar means for allowing computer
programs or other instructions to be loaded into computer system 2000. Such means may include, for example,
a removable storage unit 2022 and an interface 2020. Examples of such means may include a program cartridge and
cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or
PROM) and associated socket, and other removable storage units 2022 and interfaces 2020 which allow software and
data to be transferred from the removable storage unit 2022 to computer system 2000.

[0121] Computer system 2000 may also include a communications interface 2024. Communications interface 2024
allows software and data to be transferred between computer system 2000 and external devices. Examples of
communications interface 2024 may include a modem, a network interface (such as an Ethernet card), a communications
port, a PCMCIA slot and card, etc. Software and data transferred via communications interface 2024 are in the form
of signals 2028 which may be electronic, electromagnetic, optical or other signals capable of being received by
communications interface 2024. These signals 2028 are provided to communications interface 2024 via a communications
path 2026. Communications path 2026 carries signals 2028 and may be implemented using wire or cable, fiber optics,
a phone line, a cellular phone link, an RF link and other communications channels. Examples of signals that may be
transferred over interface 2024 include: signals and/or parameters to be coded and/or decoded such as speech and/or
audio signals and bit stream representations of such signals; and any signals/parameters resulting from the encoding
and decoding of speech and/or audio signals.

[0122] In this document, the terms "computer program medium" and "computer usable medium" are used to generally
refer to media such as removable storage drive 2014, a hard disk installed in hard disk drive 2012, and signals 2028.
These computer program products are means for providing software to computer system 2000.

[0123] Computer programs (also called computer control logic) are stored in main memory 2008 and/or secondary
memory 2010. Also, decoded speech frames, filtered speech frames, filter parameters such as filter coefficients and
gains, and so on, may all be stored in the above-mentioned memories. Computer programs may also be received via
communications interface 2024. Such computer programs, when executed, enable the computer system 2000 to
implement the present invention as discussed herein. In particular, the computer programs, when executed, enable the
processor 2004 to implement the processes of the present invention, such as Algorithms A1-A4, A1'-A4', and the
methods illustrated in FIGs. 2, 7-12, and 14, for example. Accordingly, such computer programs represent controllers
of the computer system 2000. By way of example, in the embodiments of the invention, the processes/methods
performed by signal processing blocks of quantizers and/or inverse quantizers can be performed by computer control
logic. Where the invention is implemented using software, the software may be stored in a computer program product
and loaded into computer system 2000 using removable storage drive 2014, hard drive 2012 or communications
interface 2024.
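
As an illustration of what such computer control logic might look like, the following minimal sketch applies the sub-multiple test summarized in the abstract: a candidate time lag becomes the pitch period only if its peak exceeds a threshold and its lag falls within a range of an integer sub-multiple of the lag of the global maximum peak. The function name, the parameters peak_fraction, lag_tolerance and max_divisor, and their default values are assumptions made for this sketch; the source gives no numeric values. Expressing the threshold as a fraction of the global maximum peak follows claim 5, and starting the divisors at 2 is also an assumption.

```python
from typing import Optional


def pick_submultiple_lag(
    candidate_lag: float,
    candidate_peak: float,
    global_lag: float,
    global_peak: float,
    peak_fraction: float = 0.5,  # assumed threshold fraction
    lag_tolerance: float = 2.0,  # assumed "predetermined range", in samples
    max_divisor: int = 4,        # assumed largest sub-multiple divisor tried
) -> Optional[float]:
    """Return candidate_lag if it passes steps (a) and (b), else None."""
    # Step (a): the candidate peak must exceed the peak threshold.
    if candidate_peak <= peak_fraction * global_peak:
        return None

    # Step (b): the candidate lag must lie within the tolerance of
    # global_lag / k for at least one integer k >= 2.
    for k in range(2, max_divisor + 1):
        if abs(candidate_lag - global_lag / k) <= lag_tolerance:
            # Step (c): both determinations are true.
            return candidate_lag
    return None
```

For example, with a global maximum at lag 81 and a candidate at lag 40 whose peak is 0.7 of the global peak, the candidate lies within 0.5 samples of 81/2 and is accepted under the assumed defaults.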

[0124] In another embodiment, features of the invention are implemented primarily in hardware using, for example,
hardware components such as Application Specific Integrated Circuits (ASICs) and gate arrays. Implementation of a
hardware state machine so as to perform the functions described herein will also be apparent to persons skilled in the 
relevant art(s). 

9. Conclusion 

[0125] While various embodiments of the present invention have been described above, it should be understood 
that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant 
art that various changes in form and detail can be made therein without departing from the spirit and scope of the 
invention. 

[0126] The present invention has been described above with the aid of functional building blocks and method steps 
illustrating the performance of specified functions and relationships thereof. The boundaries of these functional building 
blocks and method steps have been arbitrarily defined herein for the convenience of the description. Alternate
boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Also,
the order of method steps may be rearranged. Any such alternate boundaries are thus within the scope of the claimed
invention. One skilled in the art will recognize that these functional building blocks can be implemented by firmware,
discrete components, application specific integrated circuits, processors executing appropriate software and the like 
or any combination thereof. Thus, the breadth and scope of the present invention should not be limited by any of the 
above-described exemplary embodiments, but should be defined only in accordance with the following claims. 



Claims 



1. A method of determining a pitch period of an audio signal using a correlation-based signal derived from the audio
signal, the correlation-based signal including known peaks, each of the peaks corresponding to a respective one 
of known time lags, the known peaks including a global maximum peak, comprising: 

(a) determining if a candidate peak among the local peaks exceeds a peak threshold; 

(b) determining if a candidate time lag corresponding to the candidate peak is within a predetermined range 
of at least one integer sub-multiple of the time lag corresponding to the global maximum peak; and 

(c) setting the pitch period equal to the candidate time lag when the determinations of both steps (a) and (b) 
are true. 

2. The method of claim 1, wherein step (a) comprises determining if the candidate peak among the local peaks (i)
exceeds the peak threshold, and (ii) is within a predetermined time lag range of a previously determined pitch 
period. 

3. The method of claim 1 or 2, further comprising performing step (a) before step (b). 

4. The method of claim 1, 2 or 3, further comprising:

prior to step (a), determining if the candidate time lag is less than the time lag corresponding to the global 
maximum peak; and 

performing steps (a), (b) and (c) only if the candidate time lag is determined to be less than the time lag 
corresponding to the global maximum peak. 

5. The method of any preceding claim, wherein the peak threshold is a fraction of the global maximum peak. 

6. The method of any preceding claim, wherein the correlation-based signal is a Normalized Correlation Square 
(NCS) signal, and the peaks are peaks of the NCS signal. 

7. The method of any preceding claim, wherein the candidate time lag corresponding to the candidate peak is within 
a predetermined time lag range of a previously determined pitch period of the audio signal.

8. A method of determining a pitch period of an audio signal based on a correlation-based signal derived from the
audio signal, the correlation-based signal including known local peaks each corresponding to a respective known
time lag, the local peaks including a known global maximum local peak, each local peak corresponding to a known
interpolated peak, comprising:

(a) receiving a candidate local peak (CLP) indicator having either 

(i) a first indicator value indicating that a CLP exists among the local peaks, the CLP corresponding to

a time lag within a predetermined range of a previously determined pitch period of the audio signal, 

and 

an interpolated peak exceeding a first peak threshold, or 

(ii) a second indicator value indicating that no CLP exists among the local peaks; 

(b) if the second indicator value is received, then setting the pitch period equal to the time lag corresponding 
to the global maximum local peak; and 

(c) if the first indicator value is received, and if the CLP is the same as the global maximum local peak, then 
setting the pitch period equal to the time lag corresponding to the global maximum local peak. 

9. The method of claim 8, wherein the first indicator includes the time lag corresponding to the CLP.

10. The method of claim 8 or 9, further comprising: 

(d) if the first indicator value is received, and if the CLP is not the same as the global maximum local peak, then 

determining if the time lag corresponding to the CLP is less than the time lag corresponding to the global 
maximum local peak; and 

(e) if the determination of step (d) is true, then 

(e)(i) determining if the CLP exceeds a second peak threshold, and 
(e)(ii) if the CLP exceeds the second peak threshold, then

determining if the time lag corresponding to the CLP is within a predetermined range of at least one 
integer sub-multiple of the time lag corresponding to the global maximum local peak, and 

(e)(iii) if the determinations of both steps (e)(i) and (e)(ii) are true, then

setting the pitch period equal to the time lag of the CLP. 

11. The method of claim 10, further comprising:

performing steps (e)(ii) and (e)(iii) only when the time lag corresponding to the CLP is within a predetermined 
pitch period search range. 

12. The method of claim 10 or 11, wherein step (e) further comprises: 

(e)(iv) if either the determination of step (e)(i) or the determination of step (e)(ii) is false, then

setting the pitch period equal to the time lag of the global maximum local peak. 

13. The method of claim 10, 11 or 12, further comprising: 

(f) if the determination of step (d) is false, then 

(f)(i) determining if the CLP exceeds a third peak threshold, and
(f)(ii) if the CLP exceeds the third peak threshold, then 

setting the pitch period equal to the time lag corresponding to the CLP. 

14. The method of claim 13, wherein step (f) further comprises: 

(f)(iii) if the CLP does not exceed the third peak threshold, then 

setting the pitch period equal to the time lag corresponding to the global maximum local peak. 

15. A method of determining a pitch period of an audio signal using a correlation-based signal derived from the audio 
signal, the correlation-based signal including known peaks at corresponding known time lags, comprising:

(a) searching the correlation-based signal for a first time lag corresponding to a global maximum interpolated 
peak of the correlation-based signal; 

(b) searching the correlation-based signal for a maximum interpolated peak corresponding to a second time 
lag within a predetermined time lag range of a previously determined pitch period of the audio signal; 

(c) searching the correlation-based signal for a third time lag; and 

(d) selecting as a time lag indicative of the pitch period a preferred one of 

the first time lag if found in step (a),

the second time lag if found in step (b), and 

the third time lag if found in step (c). 

16. The method of claim 15, wherein step (a) comprises:

(a)(i) determining a largest interpolated peak and its corresponding interpolated time lag around each of at 
least some of the known peaks; and 

(a)(ii) selecting the global maximum interpolated peak and the corresponding first time lag from among the 
largest interpolated peaks and their corresponding interpolated time lags determined in step (a)(i). 

17. The method of claim 15 or 16, wherein step (c) comprises: 

searching the correlation-based signal using an integer multiple time lag extraction technique. 

18. The method of claim 17, wherein the integer multiple time lag extraction technique includes searching through 
interpolated time lags of the correlation-based signal and checking whether any of the interpolated time lags
corresponds to a sufficiently large known peak near every integer multiple of itself.

19. The method of any of claims 15 to 18, wherein if a second time lag was found in step (b), then step (d) comprises
selecting the second time lag as the preferred time lag if the second time lag is 

(i) less than the first time lag, and

(ii) within a predetermined time lag range of an integer sub-multiple of the first time lag. 

20. The method of any of claims 15 to 19, further comprising:

(e) selecting a predetermined time lag if none of the first, second and third time lags were found in respective 
steps (a), (b) and (c). 

21. A computer program for determining a pitch period of an audio signal using a correlation-based signal derived
from the audio signal, the correlation-based signal including known peaks, each of the known peaks corresponding
to a respective one of known time lags, the known peaks including a global maximum peak, the program, when
executed by one or more processors, causing the one or more processors to perform the steps of: 

(a) determining if a candidate peak among the local peaks exceeds a peak threshold; 

(b) determining if a candidate time lag corresponding to the candidate peak is within a predetermined range 
of at least one integer sub-multiple of the time lag corresponding to the global maximum peak; and 

(c) setting the pitch period equal to the candidate time lag when the determinations of both steps (a) and (b) 
are true. 



22. The computer program of claim 21, wherein step (a) comprises determining if the candidate peak among the local
peaks (i) exceeds the peak threshold, and (ii) is within a predetermined time lag range of a previously determined 
pitch period. 

23. The computer program of claim 21 or 22, wherein the program is adapted to perform step (a) before step (b). 

24. The computer program of claim 21, 22 or 23, wherein the program is adapted to perform the further steps of:

prior to step (a), determining if the candidate time lag is less than the time lag corresponding to the global 
maximum peak; and

performing steps (a), (b) and (c) only if the candidate time lag is determined to be less than the time lag 
corresponding to the global maximum peak. 

25. The computer program of any of claims 21 to 24, wherein the peak threshold is a fraction of the global maximum
peak. 

26. The computer program of any of claims 21 to 25, wherein the correlation-based signal is a Normalized Correlation 
Square (NCS) signal, and the peaks are peaks of the NCS signal. 

27. The computer program of any of claims 21 to 26, wherein the candidate time lag corresponding to the candidate 
peak is within a predetermined time lag range of a previously determined pitch period of the audio signal. 

28. A computer program for determining a pitch period of an audio signal using a correlation-based signal derived
from the audio signal, the correlation-based signal including known peaks at corresponding known time lags, the
program, when executed by one or more processors, causing the one or more processors to perform the steps of:

(a) searching the correlation-based signal for a first time lag corresponding to a global maximum interpolated 
peak of the correlation-based signal; 
(b) searching the correlation-based signal for a maximum interpolated peak corresponding to a second time
lag within a predetermined time lag range of a previously determined pitch period of the audio signal;

(c) searching the correlation-based signal for a third time lag; and 

(d) selecting as a time lag indicative of the pitch period a preferred one of 

the first time lag if found in step (a), 
the second time lag if found in step (b), and

the third time lag if found in step (c). 

29. The computer program of claim 28, wherein step (a) comprises:

(a)(i) determining a largest interpolated peak and its corresponding interpolated time lag around each of at
least some of the known peaks; and

(a)(ii) selecting the global maximum interpolated peak and the corresponding first time lag from among the 
largest interpolated peaks and their corresponding interpolated time lags determined in step (a)(i). 

30. The computer program of claim 28 or 29, wherein step (c) comprises:

searching the correlation-based signal using an integer multiple time lag extraction technique. 

31. The computer program of claim 30, wherein the integer multiple time lag extraction technique includes searching
through interpolated time lags of the correlation-based signal and checking whether any of the interpolated time
lags corresponds to a sufficiently large known peak near every integer multiple of itself.

32. The computer program of any of claims 28 to 31, wherein if a second time lag was found in step (b), then step (d)
comprises selecting the second time lag as the preferred time lag if the second time lag is

(i) less than the first time lag, and

(ii) within a predetermined time lag range of an integer sub-multiple of the first time lag. 

33. The computer program of any of claims 28 to 32, wherein the program is adapted to perform the further step of:

(e) selecting a predetermined time lag if none of the first, second and third time lags were found in respective 
steps (a), (b) and (c). 

34. A computer readable medium carrying the computer program of any of claims 21 to 33. 

35. An apparatus for determining a pitch period of an audio signal using a correlation-based signal derived from the 
audio signal, the correlation-based signal including known peaks at corresponding known time lags, comprising: 
a first module for searching the correlation-based signal for a first time lag corresponding to a global maximum
interpolated peak of the correlation-based signal;

a second module for searching the correlation-based signal for a maximum interpolated peak corresponding
to a second time lag within a predetermined time lag range of a previously determined pitch period of the audio
signal;

a third module for searching the correlation-based signal for a third time lag; and 

a fourth module for selecting as a time lag indicative of the pitch period a preferred one of 

the first time lag if found by the first module, 

the second time lag if found by the second module, and 
the third time lag if found by the third module.

36. The apparatus of claim 35, wherein the first module is configured to:

determine a largest interpolated peak and its corresponding interpolated time lag around each of at least some
of the known peaks; and

select the global maximum interpolated peak and the corresponding first time lag from among the largest 
interpolated peaks and their corresponding interpolated time lags. 

37. The apparatus of claim 35 or 36, wherein the third module is configured to search the correlation-based signal 
using an integer multiple time lag extraction technique. 

38. The apparatus of claim 37, wherein the integer multiple time lag extraction technique includes searching through 
interpolated time lags of the correlation-based signal and checking whether any of the interpolated time lags
corresponds to a sufficiently large known peak near every integer multiple of itself.

39. The apparatus of any of claims 35 to 38, wherein if a second time lag was found by the second module, then the 
fourth module is configured to select the second time lag as the preferred time lag if the second time lag is 

(i) less than the first time lag, and

(ii) within a predetermined time lag range of an integer sub-multiple of the first time lag. 

40. The apparatus of any of claims 35 to 39, wherein the fourth module is configured to select a predetermined time 
lag if none of the first, second and third time lags were found by the respective first, second and third modules. 
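
For orientation, the decision logic recited in method claims 8, 10 and 12 to 14 can be sketched as follows. This is a hedged illustration, not the claimed implementation: the function names, the (time_lag, peak_value) pair layout, the use of None as the "no CLP" indicator, and the tolerance and max_divisor defaults are all assumptions introduced here. The claims do not say which peak value is compared with the second and third thresholds, so a single value per peak is assumed, and the search-range condition of claim 11 is omitted.

```python
from typing import Optional, Tuple

Peak = Tuple[float, float]  # (time_lag, peak_value), hypothetical layout


def near_submultiple(lag: float, global_lag: float,
                     tolerance: float, max_divisor: int) -> bool:
    """True if lag is within tolerance of global_lag / k for some k >= 2."""
    return any(abs(lag - global_lag / k) <= tolerance
               for k in range(2, max_divisor + 1))


def decide_pitch(clp: Optional[Peak], global_peak: Peak,
                 second_threshold: float, third_threshold: float,
                 tolerance: float = 2.0, max_divisor: int = 4) -> float:
    """Choose between the CLP lag and the global maximum lag."""
    global_lag, _ = global_peak
    # Step (b): no candidate local peak exists.
    if clp is None:
        return global_lag
    clp_lag, clp_value = clp
    # Step (c): the CLP is the global maximum local peak itself.
    if clp == global_peak:
        return global_lag
    # Step (d): is the CLP lag shorter than the global maximum lag?
    if clp_lag < global_lag:
        # Steps (e)(i)-(e)(iii): threshold test, then sub-multiple test.
        if clp_value > second_threshold and near_submultiple(
                clp_lag, global_lag, tolerance, max_divisor):
            return clp_lag
        # Step (e)(iv): either test failed.
        return global_lag
    # Steps (f)(i)-(f)(iii): CLP lag is not shorter than the global lag.
    if clp_value > third_threshold:
        return clp_lag
    return global_lag
```

In this sketch every branch falls back to the time lag of the global maximum local peak, which mirrors how claims 8, 12 and 14 assign the pitch period when the CLP is rejected.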
